Trear: Transformer-Based RGB-D Egocentric Action Recognition
Abstract
In this article, we propose a transformer-based RGB-D egocentric action recognition framework, called Trear. It consists of two modules: 1) an interframe attention encoder and 2) a mutual-attentional fusion block. Instead of using optical flow or recurrent units, we adopt a self-attention mechanism to model the temporal structure of the data from different modalities. Input frames are cropped randomly to mitigate the effect of data redundancy. Features from each modality are interacted through the proposed fusion block and combined through a simple yet effective fusion operation to produce a joint RGB-D representation. Empirical experiments on two large data sets, THU-READ and first-person hand action, and one small data set, wearable computer vision systems, have shown that the proposed method outperforms the state-of-the-art results by a large margin.
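The two modules named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention without learned query/key/value projections, and it assumes the unspecified "simple yet effective" fusion is an elementwise sum followed by an average over frames. All function names (`attention`, `interframe_self_attention`, `mutual_attention_fusion`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each argument is a list of per-frame feature vectors (lists of floats).
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output vector per query.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return outputs


def interframe_self_attention(frames):
    """Every frame attends to all frames of the same modality."""
    return attention(frames, frames, frames)


def mutual_attention_fusion(rgb, depth):
    """Cross-modal attention in both directions, then a joint vector.

    ASSUMPTION: the fusion operation (elementwise sum, then a mean over
    frames) is a placeholder; the abstract does not specify it.
    """
    rgb_from_depth = attention(rgb, depth, depth)   # RGB queries, depth keys/values
    depth_from_rgb = attention(depth, rgb, rgb)     # depth queries, RGB keys/values
    summed = [[a + b for a, b in zip(r, d)]
              for r, d in zip(rgb_from_depth, depth_from_rgb)]
    num_frames = len(summed)
    return [sum(column) / num_frames for column in zip(*summed)]


if __name__ == "__main__":
    # Toy features: 3 frames per modality, 2 dimensions per frame.
    rgb_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    depth_frames = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
    rgb_encoded = interframe_self_attention(rgb_frames)
    depth_encoded = interframe_self_attention(depth_frames)
    joint = mutual_attention_fusion(rgb_encoded, depth_encoded)
    print(joint)
```

In this sketch the temporal structure is captured purely by the attention weights between frames, which is the role the abstract assigns to self-attention in place of optical flow or recurrent units. A real model would add learned projections, multiple heads, and positional information.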
Similar resources
Bilinear Heterogeneous Information Machine for RGB-D Action Recognition
This paper proposes a novel approach to action recognition from RGB-D cameras, in which depth features and RGB visual features are jointly used. Rich heterogeneous RGB and depth data are effectively compressed and projected to a learned shared space, in order to reduce noise and capture useful information for r...
RGB-D-based action recognition datasets: A survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it in providing a fair and objective comparative evaluation ag...
Beyond Action Recognition: Action Completion in RGB-D Data
An action is completed when its goal has been successfully achieved. Using current state-of-the-art depth features, designed primarily for action recognition, an incomplete sequence may still be classified as its complete counterpart due to the overlap in evidence. In this work we show that while features can perform comparably for action recognition, they vary in their ability to recognise inc...
Improved RGB-D-T based face recognition
Reliable facial recognition systems are of crucial importance in various applications from entertainment to security. Thanks to the deep-learning concepts introduced in the field, a significant improvement in the performance of unimodal facial recognition systems has been observed in recent years. At the same time, multimodal facial recognition is a promising approach. This paper combi...
Viewpoint Invariant Action Recognition using RGB-D Videos
In video-based action recognition, viewpoint variations often pose major challenges because the same actions can appear different from different views. We use the complementary RGB and depth information from RGB-D cameras to address this problem. The proposed technique capitalizes on the spatiotemporal information available in the two data streams to extract action features that are lar...
Journal
Journal title: IEEE Transactions on Cognitive and Developmental Systems
Year: 2022
ISSN: 2379-8920, 2379-8939
DOI: https://doi.org/10.1109/tcds.2020.3048883